Generalize fused weight split #57

skyw · 2025-10-13T22:04:44Z

Decided to remove the split-fused-parameters logic altogether: how parameters are fused/stacked together is implementation dependent, so it is hard to generalize for every case.
Instead, this PR now provides an interface to plug in a more sophisticated orthogonalize function and lets the user control how to split.
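
For illustration, here is a minimal sketch of what a user-supplied orthogonalize function for a fused QKV weight might look like. The names `orthogonalize_svd` and `make_split_orthogonalize_fn`, the split sizes, and the SVD-based orthogonalization are all assumptions made for this example; they are not part of the library, and how the resulting callable is handed to the optimizer is not shown in this thread.

```python
import torch


def orthogonalize_svd(grad: torch.Tensor) -> torch.Tensor:
    """Stand-in orthogonalization (assumption): set all singular values to 1."""
    u, _, vh = torch.linalg.svd(grad, full_matrices=False)
    return u @ vh


def make_split_orthogonalize_fn(split_sizes, base_fn=orthogonalize_svd, dim=0):
    """Build a callable that orthogonalizes each block of a fused gradient separately.

    split_sizes and dim describe how the caller's parameters were fused;
    only the caller knows this, which is why the split lives in user code.
    """

    def orthogonalize_fn(grad: torch.Tensor) -> torch.Tensor:
        blocks = torch.split(grad, split_sizes, dim=dim)
        return torch.cat([base_fn(block) for block in blocks], dim=dim)

    return orthogonalize_fn


# Hypothetical fused QKV gradient: Q has 8 rows, K and V have 4 rows each.
qkv_orthogonalize = make_split_orthogonalize_fn([8, 4, 4])
fused_grad = torch.randn(16, 8)
update = qkv_orthogonalize(fused_grad)
```

Because the split sizes and the concatenation dimension are captured in the closure, the optimizer itself only ever sees a function from a 2D gradient to a 2D update.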

copy-pr-bot · 2025-10-13T22:04:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Hao Wu <[email protected]>

skyw · 2025-10-13T23:37:46Z

Feels like this may be heading in the wrong direction; there will be more cases to support. An alternative is to just take an orthogonalize function and let users control how to split inside it, as well as the scale function and all the rest of it.

@FDecaYed @mkhona-nvidia let me know what you think.

skyw · 2025-10-13T23:38:03Z

/ok to test c178725

FDecaYed

LGTM

mkhona-nvidia · 2025-10-15T06:28:10Z

emerging_optimizers/orthogonalized_optimizers/orthogonalized_optimizer.py

+            split_grads_whitened = [self.orthogonalize_fn(g) for g in split_grads]
+            split_grad_scales = [self.scale_factor_fn(g.size(0), g.size(1)) for g in split_grads]
+
+            # TODO(skyw): Revisit whether there are cases that concatenating is not done along dim=0.


nn.Conv1d (https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv1d.html) has a 3D filter, so the output has to be reshaped to 3D.

Valid point. That's one more reason to let the user supply the orthogonalize function altogether instead of trying to generalize for everything.

Conv specifically is a completely different case, though: all the rest of the code assumes 2D, the scale function for example.
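
To make the conv case concrete, a user-supplied function could flatten the 3D filter to 2D, orthogonalize, and reshape back. This is only a sketch under assumptions: `orthogonalize_conv_grad` and `orthogonalize_svd` are hypothetical names, flattening the trailing dimensions is just one possible convention, and it does not address the 2D assumptions elsewhere in the code (such as the scale function).

```python
import torch


def orthogonalize_svd(grad: torch.Tensor) -> torch.Tensor:
    """Stand-in orthogonalization (assumption): set all singular values to 1."""
    u, _, vh = torch.linalg.svd(grad, full_matrices=False)
    return u @ vh


def orthogonalize_conv_grad(grad: torch.Tensor, base_fn=orthogonalize_svd) -> torch.Tensor:
    """Flatten a conv filter gradient to 2D, orthogonalize, restore the original shape."""
    # Conv1d weights have shape (out_channels, in_channels / groups, kernel_size).
    flat = grad.reshape(grad.size(0), -1)
    return base_fn(flat).reshape(grad.shape)


conv = torch.nn.Conv1d(in_channels=4, out_channels=6, kernel_size=3)
fake_grad = torch.randn_like(conv.weight)  # random stand-in for a real gradient
update = orthogonalize_conv_grad(fake_grad)
```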

Signed-off-by: Hao Wu <[email protected]>

skyw · 2025-10-15T21:05:03Z

/ok to test 27aa01c

Signed-off-by: Hao Wu <[email protected]>

skyw · 2025-10-17T18:13:58Z

/ok to test bd33fbb

mkhona-nvidia

LGTM

skyw added 2 commits October 13, 2025 15:05

update fused param handling

a215643

Signed-off-by: Hao Wu <[email protected]>

update callable hint

c178725

Signed-off-by: Hao Wu <[email protected]>

skyw force-pushed the skyw/generalize_fused_weight branch from 11e032a to c178725 Compare October 13, 2025 22:05

skyw requested a review from FDecaYed October 13, 2025 22:05

copy-pr-bot bot temporarily deployed to test October 13, 2025 23:38 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 13, 2025 23:40 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci October 13, 2025 23:44 Failure

copy-pr-bot bot temporarily deployed to nemo-ci October 13, 2025 23:44 Inactive

FDecaYed previously approved these changes Oct 14, 2025


mkhona-nvidia reviewed Oct 15, 2025


skyw added 2 commits October 15, 2025 13:43

get rid off split fn altogether

faee14c

Signed-off-by: Hao Wu <[email protected]>

comment update

27aa01c

Signed-off-by: Hao Wu <[email protected]>

skyw dismissed FDecaYed’s stale review via 27aa01c October 15, 2025 20:43

copy-pr-bot bot temporarily deployed to test October 15, 2025 21:05 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 15, 2025 21:09 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 15, 2025 21:12 Inactive

Update docstring and add example to OrthogonalizedOptimizer

0ed453a

Signed-off-by: Hao Wu <[email protected]>

mkhona-nvidia marked this pull request as ready for review October 17, 2025 18:12

Merge branch 'main' into skyw/generalize_fused_weight

bd33fbb

copy-pr-bot bot temporarily deployed to test October 17, 2025 18:14 Inactive

mkhona-nvidia approved these changes Oct 17, 2025


skyw enabled auto-merge (squash) October 17, 2025 18:17

copy-pr-bot bot temporarily deployed to nemo-ci October 17, 2025 18:31 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 17, 2025 18:39 Inactive

skyw merged commit d6b8fa6 into main Oct 17, 2025
14 checks passed

skyw deleted the skyw/generalize_fused_weight branch October 17, 2025 18:54

skyw mentioned this pull request Oct 17, 2025

Generalize split fused weight, QKV for example #56

Closed
